fix: add thread safety and input validation #10
Merged
- Add file locking (fcntl) to cache metadata operations for concurrent access
- Add threading.Lock for in-memory cache metadata protection
- Add file locking to usage tracking append and read operations
- Add language parameter validation for OCR tool with VALID_LANGUAGES set
- Add atomic metadata writes using temp file + rename pattern
- Add comprehensive concurrent operation tests for cache and tracking

Thread safety improvements:
- Cache: _save_metadata() uses exclusive lock with atomic write
- Cache: _load_metadata() uses shared lock for concurrent reads
- Cache: All metadata modifications protected by threading.Lock
- Tracking: record() uses exclusive lock for append operations
- Tracking: get_records() uses shared lock for read operations

Input validation:
- OCR: Invalid language codes log warning and fall back to "eng"
- OCR: VALID_LANGUAGES includes 28 common Tesseract language codes

Fixes issues identified in QA review of PR #9

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
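The locking and atomic-write pattern described in the commit message can be sketched roughly as follows. This is an illustration, not the PR's code: the function names `save_metadata` and `load_metadata`, the sidecar `.lock` file, and the choice of `fcntl.flock` (rather than `fcntl.lockf`) are all assumptions, since the commit message only says that `fcntl` locks, a `threading.Lock`, and a temp file + rename are used.

```python
import fcntl
import json
import os
import tempfile
import threading
from pathlib import Path

# Hypothetical module-level lock guarding in-process metadata changes.
_metadata_lock = threading.Lock()


def save_metadata(path: Path, metadata: dict) -> None:
    """Write metadata under an exclusive lock, using temp file + rename."""
    lock_path = Path(str(path) + ".lock")  # assumed sidecar lock file
    with _metadata_lock, open(lock_path, "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks other writers and readers
        try:
            # Create the temp file in the same directory so the rename
            # stays on one filesystem and is atomic.
            fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
            try:
                with os.fdopen(fd, "w") as tmp:
                    json.dump(metadata, tmp)
                    tmp.flush()
                    os.fsync(tmp.fileno())
                os.replace(tmp_name, path)  # readers see old or new, never partial
            except BaseException:
                if os.path.exists(tmp_name):
                    os.unlink(tmp_name)
                raise
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def load_metadata(path: Path) -> dict:
    """Read metadata under a shared lock so concurrent reads do not block each other."""
    lock_path = Path(str(path) + ".lock")
    with open(lock_path, "a") as lock_file:  # "a" creates the file without truncating
        fcntl.flock(lock_file, fcntl.LOCK_SH)
        try:
            if not path.exists():
                return {}
            with open(path) as f:
                return json.load(f)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Locking a separate sidecar file rather than the metadata file itself is one way to keep the lock valid across the rename, because `os.replace` swaps in a new inode for the metadata path.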
krisoye13
approved these changes
Feb 2, 2026
Summary
This PR addresses thread safety issues and adds input validation based on the QA review findings from PR #9.
Thread Safety Fixes
Cache Module (`src/document_analysis_mcp/cache/__init__.py`):
- `fcntl` file locking for cache metadata read/write operations
- `threading.Lock` for in-memory metadata protection
- `_remove_entry_unlocked()` for use when lock is already held
Tracking Module (`src/document_analysis_mcp/tracking/__init__.py`):
- `fcntl` exclusive lock for JSON Lines append operations
Input Validation
OCR Tool (`src/document_analysis_mcp/tools/ocr.py`):
- `VALID_LANGUAGES` frozenset with 28 common Tesseract language codes
- `_validate_language()` function that logs warning and returns default for invalid codes
Testing
Added comprehensive concurrent operation tests:
- `TestCacheThreadSafety`: 4 tests for concurrent puts, gets, mixed operations, and cleanup
- `TestTrackerThreadSafety`: 3 tests for concurrent records, mixed read/write, and line integrity
- `TestLanguageValidation`: 6 tests for language validation behavior
Changes
- `src/document_analysis_mcp/cache/__init__.py` - Thread safety with file and memory locks
- `src/document_analysis_mcp/tracking/__init__.py` - File locking for JSONL operations
- `src/document_analysis_mcp/tools/ocr.py` - Language validation
- `tests/test_cache.py` - Concurrent operation tests
- `tests/test_tracking.py` - Concurrent operation tests
- `tests/test_ocr.py` - Language validation tests
Testing
All 223 tests pass.
Linting passes.
Notes
- Relies on `fcntl`, which is Unix-only. If cross-platform support is needed, consider the `portalocker` package.
🤖 Generated with Claude Code